
    Influence mining from unstructured big data

    A crucial component of any intelligent system is to understand and predict the behavior of its users. A correct model of user behavior enables the system to serve users' needs more effectively. While much work has been done on modeling user behavior from historical activity data, little attention has been paid to how external factors influence user behavior, which is clearly important for improving an intelligent system. The influence of external factors on user behavior is mostly reflected in two different ways: 1) through significant growth in users' thirst for information related to external factors (e.g., a user may conduct many searches related to a popular event or to some community of interest), and 2) through user-generated content that is directly or indirectly related to the external factors (e.g., a user may tweet about a particular event). To capture these two aspects of user behavior, I introduce Influence Models for both Information Thirst and Content Generation, in that order, in this thesis. To the best of my knowledge, influence models for information thirst and content generation have not been studied before. The thesis starts by introducing a new data mining problem: how to mine the influence of real-world events on users' information thirst, which is important both for social science research and for designing better search engines. I solve this mining problem by proposing computational measures that quantify the influence of an event on a query in order to identify triggered queries, and then by proposing a novel extension of the Hawkes process to model the evolving trend of an event's influence on search queries. Evaluation results using news articles and search log data show that the proposed approach is effective for identifying queries triggered by events reported in news articles and for characterizing the influence trend over time.
    This influence model assumes that each event exerts its influence independently. This assumption is unrealistic, as many real-world events are correlated and influence each other, and would thus influence user search behavior jointly rather than independently. To relax this assumption, in the next part of my thesis I propose a Joint Influence Model based on the Multivariate Hawkes Process, which captures the interdependence among multiple events in terms of their influence. Experimental study shows that the Joint Influence Model achieves higher accuracy than the independent model. The second way to observe external influence on user behavior is to analyze user-generated content that is directly or indirectly related to those external factors, which I discuss in the last part of the thesis. For example, user-generated content is often significantly influenced by the community to which the user belongs. While some work has been done on mining such influence from structured information networks, little attention has been paid to mining community influence from user-generated unstructured data. To study such influence, I introduce the problem of mining community influence from user-generated unstructured content, particularly in the context of text generation. Although text generation has recently become a popular research topic following the surge of deep learning techniques, existing methods do not incorporate a community-influence factor into the generation process, and thus the process does not evolve over time. This clearly limits their application to text stream data, as most text streams evolve over time, showing distinct patterns corresponding to the shifting interests of the target community. To address this limitation, I propose an Influenced Text Generation (ITG) Process that can capture this evolution of the text generation process corresponding to evolving community influence over time.
    ITG is based on a deep learning architecture and uses LSTM cells within the hidden layers of a recurrent neural network. Experimental results with six independent text streams composed of conference paper titles show that the proposed ITG method is effective in capturing the influence of different research communities on the paper titles generated by their researchers.
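    The core modeling device of the first part, the Hawkes process, can be illustrated with its conditional intensity function: a baseline rate plus exponentially decaying "kicks" from past event occurrences. The sketch below is a minimal, generic univariate Hawkes intensity with made-up parameter values; the thesis's actual extension for event-query influence adds further structure not shown here.

```python
import math

def hawkes_intensity(t, event_times, mu=0.1, alpha=0.8, beta=1.0):
    """Conditional intensity of a univariate Hawkes process at time t.

    mu    : baseline rate (queries unrelated to the event)
    alpha : jump in intensity contributed by each past event occurrence
    beta  : exponential decay rate of an occurrence's influence
    """
    excitation = sum(
        alpha * math.exp(-beta * (t - ti)) for ti in event_times if ti < t
    )
    return mu + excitation

# Influence on query volume at t=3 of an event reported at t=0 and t=2:
rate = hawkes_intensity(3.0, [0.0, 2.0])
```

The exponential kernel makes an event's influence on search behavior peak right after it is reported and fade over time, which is the evolutionary trend the model characterizes.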

    TELeR: A General Taxonomy of LLM Prompts for Benchmarking Complex Tasks

    While LLMs have shown great success in understanding and generating text in traditional conversational settings, their potential for performing ill-defined complex tasks is largely under-studied. Indeed, we have yet to conduct comprehensive benchmarking studies with multiple LLMs that are exclusively focused on a complex task. However, conducting such benchmarking studies is challenging because of the large variation in LLMs' performance when different prompt types/styles are used and different degrees of detail are provided in the prompts. To address this issue, this paper proposes a general taxonomy that can be used to design prompts with specific properties in order to perform a wide range of complex tasks. This taxonomy will allow future benchmarking studies to report the specific categories of prompts used as part of the study, enabling meaningful comparisons across different studies. Also, by establishing a common standard through this taxonomy, researchers will be able to draw more accurate conclusions about LLMs' performance on a specific complex task.
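    The idea of grading prompts by how much detail they carry can be sketched as a prompt builder whose optional components yield progressively richer prompts. This is only an illustration of the detail-grading idea; the level definitions, component names, and example text below are this sketch's own assumptions, not TELeR's actual taxonomy, which should be taken from the paper itself.

```python
def build_prompt(directive, subtasks=None, eval_criteria=None, examples=None):
    """Assemble a prompt from optional components; supplying more
    components yields a more detailed (higher-level) prompt."""
    parts = [directive]
    if subtasks:  # enumerate the sub-steps the LLM should perform
        parts.append("Steps:\n" + "\n".join(f"- {s}" for s in subtasks))
    if eval_criteria:  # tell the LLM how its answer will be judged
        parts.append("Your answer will be judged on: " + ", ".join(eval_criteria))
    if examples:  # few-shot demonstrations
        parts.append("Examples:\n" + "\n".join(examples))
    return "\n\n".join(parts)

minimal = build_prompt("Summarize the meeting transcript.")
detailed = build_prompt(
    "Summarize the meeting transcript.",
    subtasks=["identify decisions made", "list action items"],
    eval_criteria=["coverage", "faithfulness"],
)
```

Reporting which components a study's prompts included (rather than the raw prompt text alone) is what makes results comparable across benchmarking studies.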

    Joint Upper & Lower Bound Normalization for IR Evaluation

    In this paper, we present a novel perspective on IR evaluation by proposing a new family of evaluation metrics in which existing popular metrics (e.g., nDCG, MAP) are customized by introducing a query-specific lower-bound (LB) normalization term. While the original nDCG, MAP, etc. are normalized by their upper bounds based on an ideal ranked list, a corresponding LB normalization for them has not yet been studied. Specifically, we introduce two variants of the proposed LB normalization, where the lower bound is estimated from a randomized ranking of the corresponding documents present in the evaluation set. We then conduct two case studies by instantiating the new framework for two popular IR evaluation metrics (with two variants each, i.e., DCG_UL_V1,2 and MSP_UL_V1,2) and comparing against the traditional metrics without the proposed LB normalization. Experiments on two different datasets with eight Learning-to-Rank (LETOR) methods demonstrate the following properties of the new LB-normalized metrics: 1) statistically significant differences between two methods in terms of the original metric no longer remain statistically significant in terms of the Upper-Lower (UL) bound-normalized version, and vice versa, especially for uninformative query sets; 2) compared with the original metrics, our proposed UL-normalized metrics demonstrate higher discriminatory power and better consistency across datasets. These findings suggest that the IR community should consider UL normalization seriously when computing nDCG and MAP, and that a more in-depth study of UL normalization for general IR evaluation is warranted.
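    The UL-normalization idea can be sketched for DCG: instead of dividing by the ideal-ranking upper bound alone, rescale the score between a random-ranking lower bound and that upper bound. This is a minimal sketch assuming a Monte-Carlo estimate of the expected random DCG as the lower bound; the paper's two actual LB estimators may differ in detail.

```python
import math
import random

def dcg(gains):
    """Discounted cumulative gain of a ranked list of relevance gains."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ul_normalized_dcg(ranked_gains, n_random=1000, seed=0):
    """Rescale DCG between a random-ranking lower bound (LB) and the
    ideal-ranking upper bound (UB): (DCG - LB) / (UB - LB)."""
    ub = dcg(sorted(ranked_gains, reverse=True))  # ideal ranked list
    rng = random.Random(seed)
    pool = list(ranked_gains)
    total = 0.0
    for _ in range(n_random):  # estimate expected DCG of a random ranking
        rng.shuffle(pool)
        total += dcg(pool)
    lb = total / n_random
    if ub == lb:  # all documents equally relevant: metric is uninformative
        return 0.0
    return (dcg(ranked_gains) - lb) / (ub - lb)
```

Under this scaling, a ranker that merely matches a random shuffle scores near zero rather than inheriting the flattering upper-bound-only normalization, which is what sharpens the metric's discriminatory power on uninformative queries.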

    FaNS: a Facet-based Narrative Similarity Metric

    Similar narrative retrieval is a crucial task, since narratives are essential for explaining and understanding events, and multiple related narratives often help create a holistic view of the event of interest. To accurately identify semantically similar narratives, this paper proposes a novel narrative similarity metric called Facet-based Narrative Similarity (FaNS), based on the classic 5W1H facets (Who, What, When, Where, Why, and How), which are extracted by leveraging state-of-the-art Large Language Models (LLMs). Unlike existing similarity metrics that focus only on overall lexical/semantic match, FaNS performs a more granular matching along six different facets independently and then combines them. To evaluate FaNS, we created a comprehensive dataset by collecting narratives from AllSides, a third-party news portal. Experimental results demonstrate that the FaNS metric exhibits a 37% higher correlation than traditional text similarity metrics that directly measure the lexical/semantic match between narratives, demonstrating its effectiveness in comparing the finer details between a pair of narratives.
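    The facet-wise structure of the metric can be sketched as scoring each 5W1H facet independently and averaging. The sketch below substitutes simple Jaccard token overlap for the LLM-based extraction and matching the paper actually uses, and the equal-weight average is this sketch's assumption.

```python
FACETS = ("who", "what", "when", "where", "why", "how")

def facet_similarity(a, b):
    """Jaccard token overlap between two facet strings (a lightweight
    stand-in for the paper's LLM/semantic facet matcher)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def fans_score(narrative_a, narrative_b):
    """Average per-facet similarity over the 5W1H facets.

    Each narrative is a dict mapping facet name -> extracted text;
    the extraction itself would be done by an LLM upstream."""
    return sum(
        facet_similarity(narrative_a.get(f, ""), narrative_b.get(f, ""))
        for f in FACETS
    ) / len(FACETS)

narrative = {"who": "the city mayor", "what": "opened a new bridge",
             "when": "on monday", "where": "in dhaka",
             "why": "to ease traffic", "how": "with a ribbon cutting"}
score = fans_score(narrative, narrative)  # identical facets -> 1.0
```

Matching facet-by-facet is what lets the metric notice that two stories agree on the Who and What but diverge on the Why, a distinction an overall-text similarity score blurs.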

    Redundancy Aware Multi-Reference Based Gainwise Evaluation of Extractive Summarization

    While very popular for evaluating the extractive summarization task, the ROUGE metric has long been criticized for its lack of semantic awareness and its ignorance of the ranking quality of the summarizer. Previous research has addressed these issues by proposing a gain-based automated metric called Sem-nCG, which is both rank-aware and semantic-aware. However, Sem-nCG does not consider the amount of redundancy present in a model-generated summary and currently does not support evaluation with multiple reference summaries. Unfortunately, addressing both limitations simultaneously is not trivial. Therefore, in this paper, we propose a redundancy-aware Sem-nCG metric and demonstrate how this new metric can be used to evaluate model summaries against multiple references. We also explore different ways of incorporating redundancy into the original metric through extensive experiments. Experimental results demonstrate that the new redundancy-aware metric exhibits a higher correlation with human judgments than the original Sem-nCG metric for both single- and multiple-reference scenarios.
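    The gain-based, rank-aware skeleton underlying this family of metrics is normalized cumulative gain: the gain collected by the summarizer's top-k sentences divided by the best achievable gain at k. The sketch below shows only that skeleton with toy gain values; in Sem-nCG the gains come from semantic similarity to the reference, and the paper's redundancy penalty and multi-reference aggregation are not reproduced here.

```python
def ncg_at_k(gains_in_model_order, all_gains, k):
    """Normalized cumulative gain at cutoff k.

    gains_in_model_order : gains of sentences in the order the
                           summarizer ranked them
    all_gains            : gains of every candidate sentence, used to
                           form the ideal (best-possible) top-k
    """
    cg = sum(gains_in_model_order[:k])
    ideal = sum(sorted(all_gains, reverse=True)[:k])
    return cg / ideal if ideal else 0.0

# A summarizer that picks the gain-3 and gain-1 sentences for k=2,
# when gains 3 and 2 were the best available:
quality = ncg_at_k([3, 1, 2], [3, 2, 1, 0], k=2)
```

Because the numerator depends on the order sentences were selected, a summarizer is rewarded for surfacing high-gain content early, which is the rank-awareness ROUGE lacks.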

    On Evaluation of Bangla Word Analogies

    This paper presents a high-quality dataset for evaluating the quality of Bangla word embeddings, a fundamental task in the field of Natural Language Processing (NLP). Despite being the 7th most-spoken language in the world, Bangla is a low-resource language on which popular NLP models often fail to perform well. Developing a reliable evaluation test set for Bangla word embeddings is crucial for benchmarking and guiding future research. We provide a Mikolov-style word analogy evaluation set specifically for Bangla, with a sample size of 16,678, as well as a translated and curated version of the Mikolov dataset, which contains 10,594 samples, for cross-lingual research. Our experiments with different state-of-the-art embedding models reveal that Bangla has its own unique characteristics and that current embeddings for Bangla still struggle to achieve high accuracy on both datasets. We suggest that future research should focus on training models with larger datasets and on considering the unique morphological characteristics of Bangla. This study represents a first step towards building a reliable NLP system for the Bangla language.
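    A Mikolov-style analogy question "a is to b as c is to ?" is scored by finding the vocabulary word whose vector is closest to b - a + c (the 3CosAdd protocol), excluding the three query words. The sketch below uses tiny hand-made 2D vectors purely for illustration, not real Bangla embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

def solve_analogy(emb, a, b, c):
    """3CosAdd: return the word d maximizing cos(d, b - a + c),
    with the three query words excluded from the candidates."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    candidates = (w for w in emb if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(emb[w], target))

# Toy embedding space where gender and "royalty" are separate axes:
emb = {"man": [1, 0], "woman": [1, 1], "king": [2, 0],
       "queen": [2, 1], "boy": [0.5, 0], "girl": [0.5, 1]}
answer = solve_analogy(emb, "man", "woman", "king")
```

Accuracy on an analogy set is simply the fraction of questions for which the top-ranked candidate matches the gold answer, which is how both of the paper's datasets would be consumed.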

    Exploring Challenges of Deploying BERT-based NLP Models in Resource-Constrained Embedded Devices

    BERT-based neural architectures have established themselves as popular state-of-the-art baselines for many downstream NLP tasks. However, these architectures are data-hungry and consume a lot of memory and energy, often hindering their deployment in real-time, resource-constrained applications. Existing lighter versions of BERT (e.g., DistilBERT and TinyBERT) often cannot perform well on complex NLP tasks. More importantly, from a designer's perspective, it is unclear what the "right" BERT-based architecture is for a given NLP task to strike the optimal trade-off between the resources available and the minimum accuracy desired by the end user. System engineers have to spend a lot of time conducting trial-and-error experiments to find a suitable answer to this question. This paper presents an exploratory study of BERT-based models under different resource constraints and accuracy budgets to derive empirical observations about these resource/accuracy trade-offs. Our findings can help designers make informed choices among alternative BERT-based architectures for embedded systems, saving significant development time and effort.
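    The designer's decision the study aims to inform can be sketched as a simple selection rule over profiled models: among variants that fit the memory budget and clear the accuracy floor, pick the most accurate. The accuracy and memory figures below are hypothetical placeholders, not measurements from the paper.

```python
def pick_architecture(profiles, memory_budget_mb, min_accuracy):
    """Choose a model variant under a memory budget and accuracy floor.

    profiles: dict mapping model name -> (accuracy, memory_mb)
    Returns the best feasible name, or None if nothing qualifies.
    """
    feasible = {
        name: (acc, mem)
        for name, (acc, mem) in profiles.items()
        if mem <= memory_budget_mb and acc >= min_accuracy
    }
    if not feasible:
        return None
    # Prefer higher accuracy; break ties with the smaller footprint.
    return max(feasible, key=lambda n: (feasible[n][0], -feasible[n][1]))

models = {  # hypothetical (accuracy, memory in MB) profiles
    "bert-base":  (0.92, 440),
    "distilbert": (0.90, 265),
    "tinybert":   (0.87, 60),
}
choice = pick_architecture(models, memory_budget_mb=300, min_accuracy=0.85)
```

The hard part in practice is filling in the profile table, which is exactly the trial-and-error measurement effort the paper's empirical observations are meant to replace.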